AITopics | performance gap

Collaborating Authors

performance gap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Neural Information Processing SystemsJun-15-2026, 15:31:34 GMT

Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language models (LLMs) for annotation, but LLM-generated labels still fall short of human-level quality. To address this problem, we propose the Annotation with Critical Thinking (ACT) data pipeline, where LLMs serve not only as annotators but also as judges to critically identify potential errors. Human effort is then directed towards reviewing only the most "suspicious" cases, significantly improving the human annotation efficiency. Our major contributions are as follows: (1) ACT is applicable to a wide range of domains, including natural language processing (NLP), computer vision (CV), and multimodal understanding, by leveraging multimodal-LLMs (MLLMs).

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Neural Information Processing SystemsJun-12-2026, 10:34:50 GMT

Vision-Language models (VLMs) show impressive abilities to answer questions on visual inputs (e.g., counting objects in an image), yet demonstrate higher accuracies when performing an analogous task on text (e.g., counting words in a text). We investigate this accuracy gap by identifying and comparing the circuits---the task-specific computational sub-graphs---in different modalities. We show that while circuits are largely disjoint between modalities, they implement relatively similar functionalities: the differences lie primarily in processing modality-specific data positions (an image or a text sequence). Zooming in on the image data representations, we observe they become aligned with the higher-performing analogous textual representations only towards later layers, too late in processing to effectively influence subsequent positions. To overcome this, we patch the representations of visual data tokens from later layers back into earlier layers. In experiments with multiple tasks and models, this simple intervention closes a third of the performance gap between the modalities, on average. Our analysis sheds light on the multi-modal performance gap in VLMs and suggests a training-free approach for reducing it.

artificial intelligence, name change, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.60)

Add feedback

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

Alvo, Matias, Russo, Daniel, Kanoria, Yash

arXiv.org Machine LearningMay-15-2026

We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators and suffer from severe credit-assignment issues in high-dimensional settings, leading to poor gradient quality. On the other hand, differentiable simulation largely sidesteps these issues by backpropagating through a simulator, but the presence of discrete actions or non-smooth dynamics yields biased or uninformative gradients. To address this, we propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and SF gradients while maintaining unbiasedness. We also show how problems with action discontinuities can be reformulated in hybrid form, further broadening its applicability. Empirically, HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator problems, with performance gaps increasing as the continuous action dimension grows. Finally, we characterize the structure of the mixed gradient, showing that its cross term -- which captures how continuous actions influence future discrete decisions -- becomes negligible near a discrete best response, thereby enabling approximate decentralized updates of the continuous and discrete components and reducing variance near optimality.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2605.14297

Genre: Research Report (0.82)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale Supplementary Material

Neural Information Processing SystemsApr-25-2026, 18:54:18 GMT

A.1 Details of "stronger recipe" In Table 1 of our main paper, we evaluate the impact of limited model capacity [1] and small-scale dataset by comparing the results of using "previous training recipe" and our "stronger recipe". We summarize the details of "stronger recipe" and present them in Table 13. Table 13: Stronger training strategy used for distillation. "B" and "C" represent strategies for training students on ImageNet-1K and CIFAR100, respectively. A.2 Numerical results In Figure 1 of our main paper, we present a comparison of performance gaps among vanilla KD and two logits-based baselines, i.e., DKD [2] and DIST [3], on two datasets of varying scales, to demonstrate the underestimation of vanilla KD on small-scale datasets.

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A hierarchical decomposition for explaining ML performance discrepancies

Neural Information Processing SystemsMar-22-2026, 19:08:03 GMT

Machine learning (ML) algorithms can often differ in performance across domains. Understanding why their performance differs is crucial for determining what types of interventions (e.g., algorithmic or operational) are most effective at closing the performance gaps. Aggregate decompositions express the total performance gap as the gap due to a shift in the feature distribution $p(X)$ plus the gap due to a shift in the outcome's conditional distribution $p(Y|X)$. While this coarse explanation is helpful for guiding root cause analyses, it provides limited details and can only suggest coarse fixes involving all variables in an ML system. Detailed decompositions quantify the importance of each variable to each term in the aggregate decomposition, which can provide a deeper understanding and suggest more targeted interventions. Although parametric methods exist for conducting a full hierarchical decomposition of an algorithm's performance gap at the aggregate and detailed levels, current nonparametric methods only cover parts of the hierarchy; many also require knowledge of the entire causal graph. We introduce a nonparametric hierarchical framework for explaining why the performance of an ML algorithm differs across domains, without requiring causal knowledge. Furthermore, we derive debiased, computationally-efficient estimators and statistical inference procedures to construct confidence intervals for the explanations.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.59)

Add feedback

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

Neural Information Processing SystemsMar-21-2026, 14:24:32 GMT

In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on $20$ datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of $28.34\%$ in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks.

artificial intelligence, natural language, proceedings, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.59)
Information Technology > Artificial Intelligence > Natural Language (0.59)

Add feedback

A hierarchical decomposition for explaining ML performance discrepancies

Neural Information Processing SystemsFeb-18-2026, 13:04:19 GMT

Machine learning (ML) algorithms can often differ in performance across domains.

artificial intelligence, decomposition, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Mexico > Oaxaca (0.04)
North America > United States > Nebraska (0.04)
(4 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

cfaea3a519edf73c3a0480ae8f00bc4e-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 05:17:04 GMT

artificial intelligence, gradient, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

95c6ae3f3393786203a4b6dcb9df1036-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 19:07:07 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Poland (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

A Proof Proof of Proposition 4.2 Proposition 4.2 The performance gap of evaluating policy profile (π, µ) and (π, π

Neural Information Processing SystemsFeb-16-2026, 06:36:44 GMT

Proof of Theorem 4.7 We first prove a Lemma. Theorem A.2. (Theorem 1 in [36]) Let ϵ = max Theorem 4.7 In a two-player game, suppose that According to Theorem A.2, we have J ( π, µ) J ( π, α) E CQL [20] puts regularization on the learning of Q function to penalize out-of-distribution actions. The CSP algorithm is illustrated in Algorithm 1. The proxy model is trained adversarially against our agent, therefore, we set the proxy's reward function to be the negative of our agent's reward. We show experiment details of the Maze example in this section.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback